Convergence Analysis of On-Policy LSPI for Multi-Dimensional Continuous State and Action-Space MDPs and Extension with Orthogonal Polynomial Approximation
Abstract
We propose an online, on-policy least-squares policy iteration (LSPI) algorithm that can be applied to infinite-horizon problems where states and controls are vector-valued and continuous. We do not assume special structure such as linearity or additive noise, and we assume that the expectation cannot be computed exactly. We use the concept of the post-decision state variable to eliminate the expectation inside the optimization problem. We provide a formal convergence analysis of the algorithm under the assumption that the value functions are spanned by finitely many known basis functions. Furthermore, the convergence result extends to the more general case, in which the form of the value function is unknown, using orthogonal polynomial approximation.
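To make the moving parts concrete, here is a minimal Python sketch of the kind of on-policy LSPI loop the abstract describes, written around a post-decision value function. The interfaces `env.reset()`, `env.step(a)`, `post_decision(s, a)`, the feature map `phi`, and the finite grid `candidate_actions` (standing in for the continuous-action optimization) are illustrative assumptions, not the paper's notation or exact method.

```python
import numpy as np

def greedy_action(s, w, post_decision, phi, candidate_actions):
    # pi(s) = argmax_a phi(s^a)^T w. No expectation appears inside the
    # maximization because the post-decision state s^a is a deterministic
    # function of (s, a); the exogenous noise only enters afterwards.
    values = [phi(post_decision(s, a)) @ w for a in candidate_actions]
    return candidate_actions[int(np.argmax(values))]

def on_policy_lspi(env, post_decision, phi, k, candidate_actions,
                   gamma=0.95, n_iters=20, n_steps=2000, reg=1e-3):
    """Hypothetical sketch of on-policy LSPI with a post-decision
    value function. Assumed interfaces: env.reset() -> s,
    env.step(a) -> (s_next, r), post_decision(s, a) -> s^a,
    phi(x) -> length-k feature vector."""
    w = np.zeros(k)
    for _ in range(n_iters):
        # Policy evaluation: least-squares TD on post-decision states,
        # using the trajectory generated by the current greedy policy
        # (hence on-policy). reg * I keeps the linear system well posed.
        A = reg * np.eye(k)
        b = np.zeros(k)
        s = env.reset()
        a = greedy_action(s, w, post_decision, phi, candidate_actions)
        for _ in range(n_steps):
            sa = post_decision(s, a)
            s_next, r = env.step(a)
            a_next = greedy_action(s_next, w, post_decision, phi,
                                   candidate_actions)
            sa_next = post_decision(s_next, a_next)
            f, f_next = phi(sa), phi(sa_next)
            A += np.outer(f, f - gamma * f_next)
            b += r * f
            s, a = s_next, a_next
        # Policy improvement is implicit: the next iteration acts
        # greedily with respect to the updated weight vector.
        w = np.linalg.solve(A, b)
    return w
```

The convergence guarantee in the abstract corresponds to the case where `phi` spans the true value functions; the orthogonal-polynomial extension (sketched at the end of this page) relaxes that assumption.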
Similar Papers
Representation Policy Iteration
This paper addresses a fundamental issue central to approximation methods for solving large Markov decision processes (MDPs): how to automatically learn the underlying representation for value function approximation? A novel theoretically rigorous framework is proposed that automatically generates geometrically customized orthonormal sets of basis functions, which can be used with any approxima...
Full text
Online exploration in least-squares policy iteration
One of the key problems in reinforcement learning is balancing exploration and exploitation. Another is learning and acting in large or even continuous Markov decision processes (MDPs), where compact function approximation has to be used. In this paper, we provide a practical solution to exploring large MDPs by integrating a powerful exploration technique, Rmax, into a state-of-the-art learning...
Full text
Convergence Proofs of Least Squares Policy Iteration Algorithm for High-Dimensional Infinite Horizon Markov Decision Process Problems
Most of the current theory for dynamic programming algorithms focuses on finite state, finite action Markov decision problems, with a paucity of theory for the convergence of approximation algorithms with continuous states. In this paper we propose a policy iteration algorithm for infinite-horizon Markov decision problems where the state and action spaces are continuous and the expectation cann...
Full text
Continuous-action reinforcement learning with fast policy search and adaptive basis function selection
As an important approach to solving complex sequential decision problems, reinforcement learning (RL) has been widely studied in the community of artificial intelligence and machine learning. However, the generalization ability of RL is still an open problem and it is difficult for existing RL algorithms to solve Markov decision problems (MDPs) with both continuous state and action spaces. In t...
Full text
Construction of approximation spaces for reinforcement learning
Linear reinforcement learning (RL) algorithms like least-squares temporal difference learning (LSTD) require basis functions that span approximation spaces of potential value functions. This article investigates methods to construct these bases from samples. We hypothesize that an ideal approximation space should encode diffusion distances and that slow feature analysis (SFA) constructs such s...
Full text
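The last entry above, like the abstract's extension, turns on where the basis functions come from when the form of the value function is unknown. Below is a minimal sketch of one standard choice, a tensor product of Legendre polynomials (an orthogonal family on [-1, 1]); the box bounds `lo`/`hi`, the per-dimension `degree` cap, and the function names are assumptions for illustration, not the construction used in the paper.

```python
import numpy as np
from numpy.polynomial import legendre
from itertools import product

def legendre_tensor_basis(degree, dim, lo, hi):
    """Hypothetical tensor-product Legendre basis on the box [lo, hi]^dim.
    Returns a feature map phi(x) and the number of basis functions k."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    # All multi-indices (m_1, ..., m_dim) with 0 <= m_j <= degree.
    multi_indices = list(product(range(degree + 1), repeat=dim))

    def phi(x):
        # Rescale each coordinate into [-1, 1], where the Legendre
        # polynomials are orthogonal, then multiply across dimensions.
        z = 2.0 * (np.asarray(x, float) - lo) / (hi - lo) - 1.0
        # P[n, j] = value of the degree-n Legendre polynomial at z[j].
        P = np.stack([legendre.legval(z, np.eye(degree + 1)[n])
                      for n in range(degree + 1)])
        return np.array([np.prod(P[list(m), np.arange(dim)])
                         for m in multi_indices])

    return phi, len(multi_indices)
```

The returned `phi` can be plugged into a linear method such as the LSPI sketch earlier on this page. Note that the full tensor product yields k = (degree + 1)^dim features, so in higher dimensions one would typically truncate, for example to a total-degree index set.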